HMMs and GMMs based methods for acoustic-to-articulatory speech inversion
Authors
Abstract
In order to recover the movements of articulators such as the lips, the jaw or the tongue from the speech sound, we developed and compared two inversion methods, one based on hidden Markov models (HMMs) and the other on Gaussian mixture models (GMMs). The articulator movements are characterized by the midsagittal coordinates of electromagnetic articulograph (EMA) coils attached to the articulators. In the first method, two-stream (acoustic and articulatory) HMMs are trained on synchronous acoustic and articulatory signals. The acoustic HMM is used to recognize the phones and their durations. This information is then used by the articulatory HMM to synthesize the articulatory trajectories. In the second method, a GMM that directly maps acoustic features to articulatory features is trained on the same corpus according to the minimum mean square error (MMSE) criterion, using acoustic frames spanning temporal windows of varying width. For a single-speaker EMA corpus recorded by a French speaker, the RMS reconstruction error on the test set ranges from 1.96 to 2.32 mm for the HMM-based method, versus 2.46 to 2.95 mm for the GMM-based method.
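The abstract describes the GMM-based method only at a high level. The sketch below illustrates one common way such an MMSE acoustic-to-articulatory mapping is built, under stated assumptions: a full-covariance GMM is assumed to have been fitted beforehand (for example by EM, not shown) on stacked vectors z = [x, y] of acoustic features x and EMA coordinates y, and the estimate of y is the posterior-weighted sum of each component's conditional mean. The helper names `stack_context`, `gmm_mmse_map` and `rmse`, the edge-padding scheme for the context window, and the pooled RMS definition are illustrative assumptions, not the authors' code.

```python
import numpy as np


def stack_context(feats, k):
    """Concatenate each frame with its k left and k right neighbours.

    Assumption: edge frames are padded by repeating the first/last frame.
    Input shape (T, D) -> output shape (T, (2k+1) * D).
    """
    T = feats.shape[0]
    padded = np.concatenate([np.repeat(feats[:1], k, axis=0), feats,
                             np.repeat(feats[-1:], k, axis=0)], axis=0)
    # Row t of the result holds original frames t-k .. t+k side by side.
    return np.hstack([padded[i:i + T] for i in range(2 * k + 1)])


def log_gauss(x, mean, cov):
    """Row-wise log-density of a multivariate normal for x of shape (N, D)."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)                 # cov = L L^T
    sol = np.linalg.solve(chol, (x - mean).T)      # L^{-1} (x - mean), (D, N)
    maha = np.sum(sol ** 2, axis=0)                # squared Mahalanobis distance
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def gmm_mmse_map(x, weights, means, covs, dx):
    """MMSE estimate E[y | x] under a joint GMM over z = [x, y].

    weights: (M,), means: (M, Dx+Dy), covs: (M, Dx+Dy, Dx+Dy).
    The first dx dimensions are acoustic, the rest articulatory.
    """
    M = len(weights)
    # Unnormalized log posterior: log w_m + log N(x; mu_x^m, Sigma_xx^m).
    log_post = np.stack([np.log(weights[m]) +
                         log_gauss(x, means[m, :dx], covs[m, :dx, :dx])
                         for m in range(M)], axis=1)          # (N, M)
    log_post -= log_post.max(axis=1, keepdims=True)           # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                   # P(m | x)
    y_hat = np.zeros((x.shape[0], means.shape[1] - dx))
    for m in range(M):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Conditional mean: mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x), per row.
        cond = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y_hat += post[:, [m]] * cond
    return y_hat


def rmse(y_true, y_pred):
    """RMS error pooled over all frames and coordinates (same unit as input, e.g. mm)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    # Toy single-component model where y = 2x on average, so E[y | x] = 2x.
    w = np.array([1.0])
    mu = np.zeros((1, 2))
    cov = np.array([[[1.0, 2.0], [2.0, 4.01]]])
    x = np.array([[1.0], [-0.5]])
    y_hat = gmm_mmse_map(x, w, mu, cov, dx=1)
    print(y_hat.ravel(), rmse(2 * x, y_hat))
```

In this sketch the context half-width `k` stands in for the varying "temporal span" of acoustic frames mentioned in the abstract: a larger `k` gives the GMM more acoustic context per frame, at the cost of a higher-dimensional x and more covariance parameters to estimate. Whether the reported RMS values are pooled as in `rmse` or averaged per coil coordinate is not stated in the abstract.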
Similar articles
Acoustic-to-articulatory inversion in speech based on statistical models
Two speech inversion methods are implemented and compared. In the first, multistream Hidden Markov Models (HMMs) of phonemes are jointly trained from synchronous streams of articulatory data acquired by EMA and speech spectral parameters; an acoustic recognition system uses the acoustic part of the HMMs to deliver a phoneme chain and the states durations; this information is then used by a traj...
Can tongue be recovered from face? The answer of data-driven statistical models
This study revisits the face-to-tongue articulatory inversion problem in speech. We compare the Multi Linear Regression method (MLR) with two more sophisticated methods based on Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs), using the same French corpus of articulatory data acquired by ElectroMagnetoGraphy. GMMs give overall results better than HMMs, but MLR does poorly. GMMs a...
Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Acoustic-to-articulatory inversion is useful for a range of related research areas including language learning, speech production, speech coding, speech recognition and speech synthesis. HMM-based generative modelling methods and DNN-based approaches have become dominant approaches in recent years. In this paper, a novel acoustic-to-articulatory inversion technique based on generalized variable ...
Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models
In order to recover the movements of usually hidden articulators such as tongue or velum, we have developed a data-based speech inversion method. HMMs are trained, in a multistream framework, from two synchronous streams: articulatory movements measured by EMA, and MFCC + energy from the speech signal. A speech recognition procedure based on the acoustic part of the HMMs delivers the chain of p...
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information
In recent years, neural network based acoustic-to-articulatory inversion approaches have achieved state-of-the-art performance. One major issue associated with these approaches is the lack of phone sequence information during inversion. In order to address this issue, this paper proposes an improved architecture hierarchically concatenating phone classification and articulatory inversion co...